by G. Julián Cervera Leonetti
The following exploratory data analysis was conducted with Python 3 for The Rainforest Alliance through the Center for Integrated Natural Resources and Agricultural Management (University of Minnesota, Twin Cities). The graphs and descriptions are based on a single DataFrame, originally in .xls format, in which each row is a product or service offered by one of 79 small and medium-sized enterprises (known in Spanish as PyME, which stands for “pequeñas y medianas empresas”). All these companies operate in the Guatemalan region of Petén and are partnered with RA (assumed to refer to The Rainforest Alliance) or with Asociación de Comunidades Forestales de Petén (ACOFOP). The period covered is 2018 to 2022, and the most relevant information is the total sales in USD of timber and non-timber forest products, among other products and services. There is also information about the labor input required for each product and service, which could be analyzed in a potential second phase of the exploration. The following graphs analyze sales through the various categories into which the products and services can be classified. The full Python 3 code can be found and independently run here.
As you explore the graphs below, make sure to zoom into the different segments (if possible) and to click and double-click on the figures’ legends to isolate variables and categories. Double-clicking in the center of a figure resets the zoom. Double-clicking on a legend shows all variables and categories again. Hovering the cursor over a graph reveals extra information.
Graph 1 shows three ways of subdividing the more than $24 million sold by the 79 PyMEs included in the DataFrame. Except for the middle pie chart, the labels are self-explanatory. Readers are encouraged to click on the graph’s legend to make categories appear and disappear, which allows comparisons between selected categories. To improve readability, two possible values for “Value chain stage” were left out of the middle pie chart because they are very small compared with the other categories: “commercialization” and “not specified”, each of which accounted for less than 0.5% of the total. The graph clearly shows the dominance of timber products, primary transformation products, and certified producers. Timber’s share of the sales is five times greater than that of non-timber forest products.
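The shares behind a pie chart like this one come from summing sales per category and dividing by the grand total. The snippet below is a minimal sketch of that calculation in plain Python, not the code used for the analysis. The rows, category names, and amounts are hypothetical and chosen only to make the arithmetic easy to follow; with a pandas DataFrame the same result would typically come from a grouped sum.

```python
from collections import defaultdict

# Hypothetical rows: (product type, sales in USD). Illustrative values only,
# not taken from the DataFrame described in the text.
rows = [
    ("timber", 500.0),
    ("non-timber", 100.0),
    ("timber", 250.0),
    ("services", 150.0),
]


def shares_by_category(rows):
    """Return each category's fraction of total sales."""
    totals = defaultdict(float)
    for category, sales in rows:
        totals[category] += sales
    grand_total = sum(totals.values())
    if grand_total == 0:
        raise ValueError("total sales are zero; shares are undefined")
    return {category: amount / grand_total for category, amount in totals.items()}


shares = shares_by_category(rows)

# Ratio between two categories' shares, analogous to comparing timber
# with non-timber forest products.
ratio = shares["timber"] / shares["non-timber"]
```

Because the shares are fractions of the same total, the ratio between two categories is the same whether it is computed from shares or from raw sales.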
Graph 2’s inner ring functions just like a pie chart, since it shows how sales are subdivided by product or service category. Its outer ring shows at which stages of the value chain each category’s products or services were sold, and what portion of each category’s sales occurred at each stage. Unless an inner ring category is selected, the percentages on the outer ring refer to the overall total, not to a given inner ring category. Readers are encouraged to click on the inner ring categories that they are most interested in, and to hover over the graph with their cursor to reveal the amounts in USD and to read the names of particularly thin "slices". Of non-timber forest product sales, 90% correspond to primary production activities, compared with only 21% of timber product sales. Primary transformation accounts for 78% of timber product sales.
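The difference between an outer-ring percentage that refers to the overall total and one that refers to its inner-ring category can be made concrete with a small calculation. The following is a sketch under assumed data: the records, labels, and amounts are hypothetical, and the function name `nested_shares` is introduced here only for illustration.

```python
from collections import defaultdict

# Hypothetical records: (category, value chain stage, sales in USD).
records = [
    ("timber", "primary production", 200.0),
    ("timber", "primary transformation", 800.0),
    ("non-timber", "primary production", 180.0),
    ("non-timber", "commercialization", 20.0),
]


def nested_shares(records):
    """For each (category, stage) pair, return its share of the grand
    total and its share of the parent category's total."""
    stage_totals = defaultdict(float)
    category_totals = defaultdict(float)
    for category, stage, sales in records:
        stage_totals[(category, stage)] += sales
        category_totals[category] += sales
    grand_total = sum(category_totals.values())

    result = {}
    for (category, stage), amount in stage_totals.items():
        result[(category, stage)] = {
            "of_total": amount / grand_total,
            "of_category": amount / category_totals[category],
        }
    return result


nested = nested_shares(records)
ntfp_primary = nested[("non-timber", "primary production")]
```

In this made-up example, primary production is 90% of non-timber sales but only 15% of all sales, which is why the two readings of the outer ring should not be confused.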
Graph 3 shows the 21 value chains included in the DataFrame, sorted by sales in descending order. The largest value chain by far is “Madera Aserrada de Petén”, which can be translated as “Sawn Timber from Petén”. Readers are encouraged to hover over the bars to get the exact amount sold in USD. To compare the lowest-selling value chains, readers can also change the y-axis scale by clicking and dragging over the y-axis labels, and zoom in by clicking and dragging over the dark grey area. The only value chain without sales, "Molduras y componentes de la RBM", was included in the graph, but it can easily be removed from view by pulling the x-axis to the right. Of those that had sales, the lowest-selling value chain is "Actividades de traspatio en Petén", with only $317, followed by "Hortalizas Petén", with $1,899.
Graphs 4 and 5 compare the sales of the same sorted list of products and services. The information is split into two consecutive charts for better visualization, and the two charts do not contain the same number of bars. Consistent with all graphs so far, timber products have the highest sales.
Graphs 6 and 7 are consecutive in the same way as graphs 4 and 5. In this case, the comparison is between PyMEs, and the total amount sold is subdivided into value chain stages. Zoom into and hover over the bars for more information. Click on the chart’s legend to isolate variables of interest. The lower-selling PyMEs tend to focus on primary production, while the highest-selling PyMEs present a more mixed sales composition, with a focus on primary transformation. If extended, the exploration could include the calculation of an adapted version of the Gini coefficient to describe the level of inequality among companies in terms of their total sales. Note: graph 6 includes many more companies than graph 7.
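As a starting point for the possible extension mentioned above, the standard Gini coefficient of a list of per-company sales totals can be computed from the sorted values. This is a minimal sketch of the textbook formula, not the adapted version the text has in mind, and the example figures are hypothetical. With values sorted in ascending order and ranked i = 1..n, it uses G = 2·Σ(i·x_i) / (n·Σx_i) − (n + 1)/n.

```python
def gini(values):
    """Standard Gini coefficient of non-negative values.

    Returns 0.0 when all values are equal and approaches 1 as the total
    concentrates in a single observation.
    """
    xs = sorted(values)
    n = len(xs)
    if n == 0:
        raise ValueError("at least one value is required")
    if xs[0] < 0:
        raise ValueError("values must be non-negative")
    total = sum(xs)
    if total == 0:
        raise ValueError("the Gini coefficient is undefined when the total is zero")
    # Rank-weighted sum over the ascending values, with ranks starting at 1.
    weighted = sum(rank * x for rank, x in enumerate(xs, start=1))
    return 2.0 * weighted / (n * total) - (n + 1.0) / n


# Hypothetical total sales per company, in USD.
equal_sales = [100.0, 100.0, 100.0, 100.0]
skewed_sales = [0.0, 0.0, 0.0, 10.0]

g_equal = gini(equal_sales)
g_skewed = gini(skewed_sales)
```

With only n observations the maximum of this formula is (n − 1)/n rather than 1, which is one reason an adapted version might be preferred when comparing groups of different sizes.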
Graphs 8 and 9 can be used and interpreted similarly to graphs 6 and 7. Timber products are evidently dominant among the highest-selling companies, while lower-selling companies tend to focus on “other activities”, and, to a lesser extent, on services and non-timber forest products.
As shown in graphs 10 and 11, the “urban” component of sales is much larger among the higher-selling companies. The exact meaning of the urban/rural distinction was not provided, but it is a binary variable that changes from product to product.